EE 511 Simulation Methods for Stochastic Systems

Project #1: Sampling and Waiting



In [15]:

    
using Distributions;
using Gadfly;
srand(1000);

[Adding Coins…]



In [2]:

    
## Write a routine to simulate a fair
## Bernoulli trial in your language of choice. 
function simulate_bernoulli(p, N_trials)
    rand(Bernoulli(p), N_trials)
end;

## Write a routine to count the number
## of successes in 5 fair Bernoulli trials.
function count_successes_5_bern()
    sum(rand(Bernoulli(0.5), 5))
end;

## Write a routine to count the number of
## trials before the first successful Bernoulli trial. 
function count_before_success()
    trials = 0 
    while rand(Bernoulli()) < 1
        trials+=1
    end
    return trials
end;



In [3]:

    
## Generate a histogram for 100
## simulated Bernoulli trials
trials = simulate_bernoulli(0.5, 100);
plot(x=trials, Geom.histogram,
     Guide.xlabel("Number of successes"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of number of successes in one bernoulli trial"))









    Out[3]:

As evident from the histogram, the bars at 0 and $1$ are almost equiheight, the distribution resembles a fair bernoulli trial($p=0.5$)

Number of successes in 5 fair Bernoulli trials

Number of successes in 5 bernoulli trials is distributed as Binomial(5,0.5) and with mean = $np = 2.5$



In [4]:

    
## Generate a histogram for 100
## samples of this counting random.
trials = [count_successes_5_bern() for i=1:100];
plot(x=trials, Geom.histogram, 
     Guide.xlabel("Number of successes"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of number of successes in five bernoulli trials"))









    Out[4]:

Number of trials before the first successful Bernoulli trial

Let $X$ represent the associated random variable. If the number of trials before first success be $k$, so for $k-1$ trials the bernoulli trial is a failure.

Then, $P(X=k)=(1-p)^{k-1}p$ $k \in \{1,2,\dots,\}$ i.e. $X$ is a geometric random variable with mean $\frac{1}{p}=2$



In [5]:

    
## Generate a histogram for 100
## samples of this counting random variable.
trials = [count_before_success() for i=1:100];
plot(x=trials, Geom.histogram, 
     Guide.xlabel("Number of trials"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of number of trials before first success"))









    Out[5]:

[Coin Limits]

Take your Bernoulli success counting routine. Generate and sum k=2 samples from this routine. Generate 300 such sums and histogram your results. Repeat for k={5, 10, 30, 50}. Compare your histograms to a bell curve. How do you justify your observations?

Note: Instead of generating the sum, I generate the mean here since it makes it easier to compare for different $k$



In [6]:

    
function count_successes_5_bern_sumk(k)
    sums = 0
    for i=1:k
        sums+= count_successes_5_bern()
    end
    return sums
end;

k=2



In [7]:

    
k = 2
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
    Guide.xlabel("Mean"),
    Guide.ylabel("Frequency"),
    Guide.title("Distribution of mean of k=2 trials before first success"))









    Out[7]:

k=5



In [8]:

    
k = 5
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
     Guide.xlabel("Mean"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of mean of k=5 trials before first success"))









    Out[8]:

k=10



In [9]:

    
k = 10
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
     Guide.xlabel("Mean"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of mean of k=10 trials before first success"))









    Out[9]:

k=30



In [10]:

    
k = 30
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
     Guide.xlabel("Mean"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of mean of k=30 trials before first success"))









    Out[10]:

k=50



In [11]:

    
k = 50
sums_samples = [count_successes_5_bern_sumk(k)/k for i=1:300]
plot(x=sums_samples, Geom.histogram,
     Guide.xlabel("Mean"),
     Guide.ylabel("Frequency"),
     Guide.title("Distribution of mean of k=30 trials before first success"))









    Out[11]:

As $k$ increases the distribution gets closer to a bell curve centered at the population mean $\mu=2.5$, as follows from the central limit theorem.

[Bootstrap]



In [12]:

    
data = readcsv("data/NJGAS.dat");
μ = mean(data);

Mean of data: $\mu = 97.167$



In [13]:

    
N_bootstraps = 1000;
n_samples_to_draw = 20;
bootstrap_mean = sort([mean(sample(data, n_samples_to_draw, replace=true)) for i=1:N_bootstraps]);
bootstrap_quantiles = quantile(bootstrap_mean,[0.25, 0.75]);



In [14]:

    
print(bootstrap_quantiles)









    



Any[79.4875,113.9]